-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(native): Add endpoint for expression optimization in sidecar #26475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @pramodsatya, your pull request is larger than the review limit of 150000 diff characters
d3f5a74 to
4224f36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @pramodsatya, your pull request is larger than the review limit of 150000 diff characters
|
@aditi-pandit, @tdcmeehan, could you please take a look? |
presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.h
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp
Outdated
Show resolved
Hide resolved
8034891 to
69c7c37
Compare
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pramodsatya. Have bunch of comments.
presto-native-execution/presto_cpp/main/types/tests/VeloxToPrestoExprTest.cpp
Outdated
Show resolved
Hide resolved
69c7c37 to
388eb89
Compare
|
Thanks @aditi-pandit, made the changes as suggested. Could you PTAL? |
388eb89 to
dfb9a2b
Compare
| constexpr char const* kOptimizerLevelHeader = | ||
| "X-Presto-Expression-Optimizer-Level"; | ||
|
|
||
| velox::core::TypedExprPtr rewriteInExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pramodsatya : I'm confused about this. This is registered as a regular rewrite with the Velox optimization. So then its going to happen post Presto -> Velox expression conversion. If Presto -> Velox expression conversion can generate expressions with this IN expression format then how was Velox evaluating them in the past ? If it was missing this structural change then we can add it in Velox itself.. Conversely, if the Presto -> Velox expression conversion generates IN with subfield filters then we this optimization would never be triggered anyway. Don't follow the reasoning to have this in the Prestissimo layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aditi-pandit, I added this here because the resolved Velox function was throwing an error for cases where IN-list is not constant. I do not see a similar error in regular execution path and this needs further analysis. Reverted this for now and will move the rewrite to Velox after fixing this error. 4 of the tests that validate this optimization have been disabled and will enable them once Velox rewrite is in.
Could you please take another look?
dfb9a2b to
7ac6fad
Compare
| resultExpression = inputRowExpr; | ||
| } | ||
| protocol::to_json(j, resultExpression); | ||
| VLOG(1) << "Optimized RowExpression: " << j.dump(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log level 1 seems a bit too low for this detail?
If the resultExpression failed how would the user be helped with this? This assumes the saw the unoptimized version and ow can see it was either failing to optimize or couldn't be converted.
Basically, I'm not clear how useful this VLOG is other than seeing an expression in the log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VLOGs were added for ease of debugging in case there is an error during expression evaluation, removed them and added a LOG(ERROR) with info about expression that fails during evaluation, thanks.
| [[fallthrough]]; | ||
| case velox::core::ExprKind::kInput: | ||
| [[fallthrough]]; | ||
| case velox::core::ExprKind::kLambda: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a default that issues a VELOX_USER_ERROR about an unsupported expression kind? Are there others that are not covered? Could this lead to false positives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, there are no other expression types and I avoided adding a default case here to ensure the conversion logic from TypedExpr to RowExpression will be explicitly added here for any additional ExprKinds that could be introduced in Velox in the future. If a new ExprKind is introduced and the default case returns a nullptr, the VeloxToPrestoExpr conversion could fail for any subexpression, since this function is called recursively by other internal functions of VeloxToPrestoExprConverter (eg: functions that convert TypedExpr to SpecialFormExpression).
If a VELOX_USER_ERROR is issued here when conversion fails, the endpoint will not return a response so chose to return a nullptr instead and return an unoptimized RowExpression from the endpoint, is that fine?
Also since the VeloxToPrestoExpr conversion is tightly tied to PrestoToVeloxExpr conversion, the e2e Presto expression tests and the Presto expression fuzzer (to be added in a follow-up) would help detect expression conversion failures that result from any changes to PrestoToVeloxExpr.
Alternatively, the input RowExpression can be passed as an argument to getRowExpression and the default case could be to just return the input RowExpression in case VeloxToPrestoExpr conversion fails.
Update: Modified VeloxToPrestoExprConverter::getRowExpression to take in an optional RowExpression parameter, which would be returned if the expression conversion fails. Please let me know if this is alright.
What do you suggest @czentgr @aditi-pandit ?
7ac6fad to
32bd567
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local doc build. Thanks!
Co-authored-by: Pramod Satya <[email protected]>
32bd567 to
3e2b5ee
Compare
Description
To support constant folding and consistent semantics between the Presto Java coordinator and the Presto C++ worker, it is necessary to use consistent expression evaluation. To support this, a native expression evaluation endpoint,
v1/expressions, has been added to the Presto C++ sidecar. This endpoint leverages theExprOptimizerin Velox to optimize Presto expressions.The optimized Velox
core::TypedExprreturned by Velox'sExprOptimizeris converted to a Prestoprotocol::RowExpressionin the Presto native sidecar with helper classVeloxToPrestoExprConverter. The end to end flow between the coordinator and sidecar is as follows:If an error is encountered during Velox to Presto expression conversion, it is logged for further analysis and the unoptimized input
RowExpressionis returned instead. With the fuzzer testing (see test plan), we expect this endpoint to be ready for production workloads.When the
OptimizerLevelisEVALUATED, the endpoint throws for any error encountered during expression evaluation.Motivation and Context
The
native-sidecar-pluginwill implement theExpressionOptimizerinterface from Presto SPI to utilize thev1/expressionsendpoint on the sidecar for optimizing Presto expressions using the native expression evaluation engine.The primary goal is to achieve consistency between C++ and Java semantics for expression optimization. With this change, C++ functions including UDFs can be used for constant folding of expressions in the Presto planner.
Please refer to RFC-0006.
Test Plan
Tests have been added by abstracting testcases from
TestRowExpressionInterpreterto an interfaceAbstractTestExpressionInterpreter. This test interface is implemented inTestNativeExpressionInterpreterto test thev1/expressionsendpoint on the sidecar end to end.Unit tests for simple expression conversions are also added in
VeloxToPrestoExprConverter.cpp.This feature is still in Beta, and to support production workloads with complete certainty, the Velox expression fuzzer will be extended to test this endpoint with fuzzer generated expressions in a follow-up PR. This should surface any remaining bugs.
Release Notes